Skip to content

[skills] Improve android-reviewer skill based on eval findings#11075

Merged
jonathanpeppers merged 3 commits intomainfrom
jonathanpeppers/evaluate-android-reviewer-skill
Apr 3, 2026
Merged

[skills] Improve android-reviewer skill based on eval findings#11075
jonathanpeppers merged 3 commits intomainfrom
jonathanpeppers/evaluate-android-reviewer-skill

Conversation

@jonathanpeppers
Copy link
Copy Markdown
Member

Summary

Iterative improvements to the android-reviewer skill based on structured evaluations against 4 real PRs (3 Copilot-authored, 1 human-authored).

Changes

SKILL.md

  • Add guidance that reviews should produce inline comments even on clean PRs (💡 suggestions for code consolidation, missing tests, etc.)
  • Add prioritized checklist for diff analysis (bugs → safety → perf → missing tests → duplication → consistency → docs)
  • Clarify CI verdict consistency: internal pipeline failures still require ⚠️ Needs Changes verdict
  • Document --dry-run flag for submit_review.cs

review-rules.md

  • Rename section 5 to "Async, Cancellation & Thread Safety Patterns"
  • Add "Avoid double-checked locking — use Lazy<T> or LazyInitializer" rule with MS Learn reference
  • Add "Singleton initialization completeness" rule for publish-before-setup bugs

submit_review.cs

  • Add --dry-run flag that validates and pretty-prints review JSON without posting to GitHub
  • Use Utf8JsonWriter for AOT-compatible JSON serialization (fixes crash on .NET 10)

Evaluation Results

Tested against 4 PRs across 3 iterations:

Metric Iter 1 Iter 2 Iter 3
With-skill pass rate 100% (30/30) 97.5% (39/40)
Without-skill baseline 46.7% (14/30) 45% (18/40)
Inline comments produced 0 7
Thread-safety bug caught N/A ❌ No ✅ Yes

Key improvements:

  • Inline comments: Went from 0 to 1-2 per review, matching quality of no-skill baselines but with proper formatting (🤖 prefix, severity, rule references)
  • Thread-safety detection: New DCL/singleton rules enabled catching a real race condition in PR [TrimmableTypeMap] Fix thread safety in TrimmableTypeMap.Initialize #11049 that human reviewers also found
  • CI consistency: Codified that internal pipeline failures still block LGTM

jonathanpeppers and others added 2 commits April 3, 2026 10:20
Changes to SKILL.md:
- Add guidance that reviews should produce inline comments even on
  clean PRs (use suggestions for code consolidation, missing tests, etc.)
- Add prioritized checklist for what to look for in diff analysis
- Clarify CI verdict consistency: internal pipeline failures still
  require Needs Changes verdict
- Document --dry-run flag for submit_review.cs

Changes to review-rules.md:
- Rename section 5 to 'Async, Cancellation & Thread Safety Patterns'
- Add 'Double-checked locking requires volatile' rule with detailed
  ARM64 memory model guidance
- Add 'Singleton initialization completeness' rule for publish-before-
  setup bugs

Changes to submit_review.cs:
- Add --dry-run flag that validates and pretty-prints review JSON
  without posting to GitHub
- Use Utf8JsonWriter for AOT-compatible JSON serialization

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The volatile keyword is discouraged by Microsoft's own documentation.
Update the DCL review rule to recommend Lazy<T> or LazyInitializer
instead, with a link to the MS Learn docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 3, 2026 15:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refines the .github/skills/android-reviewer skill guidance and tooling based on evaluation feedback, aiming to improve review consistency (inline comments, CI verdict handling) and add a safer review submission workflow via a --dry-run mode.

Changes:

  • Updated SKILL.md to encourage at least one inline suggestion on non-trivial PRs, add a prioritized diff-review checklist, and clarify internal CI failure verdict handling.
  • Expanded review-rules.md with additional thread-safety guidance (avoid DCL; ensure singleton init completeness) and renamed the async section accordingly.
  • Enhanced submit_review.cs with a --dry-run option that pretty-prints validated review JSON using Utf8JsonWriter.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
.github/skills/android-reviewer/SKILL.md Adds guidance for inline comment expectations, a review priority checklist, CI verdict clarification, and documents --dry-run usage.
.github/skills/android-reviewer/references/review-rules.md Renames async section and adds new thread-safety rules (avoid DCL; singleton init completeness).
.github/skills/android-reviewer/scripts/submit_review.cs Adds --dry-run mode and uses Utf8JsonWriter to print validated JSON without posting.

Comment thread .github/skills/android-reviewer/scripts/submit_review.cs Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@jonathanpeppers jonathanpeppers merged commit 77f88ea into main Apr 3, 2026
2 of 5 checks passed
@jonathanpeppers jonathanpeppers deleted the jonathanpeppers/evaluate-android-reviewer-skill branch April 3, 2026 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants